Predicting the Compositionality of Multiword Expressions Using Translations in Multiple Languages

Authors

  • Bahar Salehi
  • Paul Cook
Abstract

In this paper, we propose a simple, language-independent and highly effective method for predicting the degree of compositionality of multiword expressions (MWEs). We compare the translations of an MWE with the translations of its components, using a range of different languages and string similarity measures. We demonstrate the effectiveness of the method on two types of English MWEs: noun compounds and verb particle constructions. The results show that our approach is competitive with or superior to state-of-the-art methods over standard datasets.

1 Compositionality of MWEs

A multiword expression (MWE) is any combination of words with lexical, syntactic or semantic idiosyncrasy (Sag et al., 2002; Baldwin and Kim, 2009), in that the properties of the MWE are not predictable from the component words. For example, with ad hoc, the fact that neither ad nor hoc is a standalone English word makes ad hoc a lexically idiosyncratic MWE; with shoot the breeze, on the other hand, we have semantic idiosyncrasy, as the meaning of “to chat” in usages such as It was good to shoot the breeze with you (example taken from http://www.thefreedictionary.com) cannot be predicted from the meanings of the component words shoot and breeze. Semantic idiosyncrasy has been of particular interest to NLP researchers, with research on binary compositional/non-compositional MWE classification (Lin, 1999; Baldwin et al., 2003), or a three-way compositional/semi-compositional/non-compositional distinction (Fazly and Stevenson, 2007). There has also been research to suggest that MWEs span the entire continuum from full compositionality to full non-compositionality (McCarthy et al., 2003; Reddy et al., 2011). Investigating the degree of MWE compositionality has been shown to have applications in information retrieval and machine translation (Acosta et al., 2011; Venkatapathy and Joshi, 2006).
As an example from information retrieval, if we were looking for documents relating to rat race (meaning “an exhausting routine that leaves no time for relaxation”; definition from WordNet 3.1), we would not be interested in documents on rodents. These results underline the need for methods for broad-coverage MWE compositionality prediction. In this research, we investigate the possibility of using an MWE’s translations in multiple languages to measure the degree of the MWE’s compositionality, and investigate how literal the semantics of each component is within the MWE. We use Panlex to translate the MWE and its components, and compare the translations of the MWE with the translations of its components using string similarity measures. The greater the string similarity, the more compositional the MWE is. Whereas past research on MWE compositionality has tended to be tailored to a specific MWE type (McCarthy et al., 2007; Kim and Baldwin, 2007; Fazly et al., 2009), our method is applicable to any MWE type in any language. Our experiments ...
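
The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the similarity measure (longest common substring, normalised by the shorter string), the averaging over components and languages, and the function names are all choices made here for illustration. The translations are hand-typed examples, not Panlex output, and no Panlex lookup is performed.

```python
from statistics import mean


def longest_common_substring(a: str, b: str) -> int:
    """Return the length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str) -> float:
    """Case-insensitive LCS length divided by the shorter string's length.

    The normalisation is an assumption made for this sketch.
    """
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    return longest_common_substring(a, b) / shorter if shorter else 0.0


def compositionality_score(mwe_translations, component_translations):
    """Average, over languages, the mean similarity between the MWE's
    translation and the translations of its components.

    mwe_translations: {language: translation of the whole MWE}
    component_translations: {language: [translation of each component]}
    Taking a plain mean at both levels is an assumption of this sketch.
    """
    per_language = []
    for lang, mwe_tr in mwe_translations.items():
        comps = component_translations.get(lang)
        if not comps:
            continue  # skip languages with no component translations
        per_language.append(mean(similarity(mwe_tr, c) for c in comps))
    return mean(per_language) if per_language else 0.0


# Hand-typed illustrative translations for "swimming pool" (not Panlex data).
mwe = {"deu": "Schwimmbecken", "fra": "piscine"}
components = {
    "deu": ["Schwimmen", "Becken"],   # swimming, pool
    "fra": ["natation", "bassin"],    # swimming, pool
}
print(compositionality_score(mwe, components))
```

A higher score means the MWE's translations share more surface material with the translations of its components, which the method treats as evidence of compositionality. Other string similarity measures could be substituted for `similarity` without changing the rest of the sketch.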

Related resources

Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality

We predict the compositionality of multiword expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into m...

A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions

This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both single- and multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, ...

Compositionality And Multiword Expressions: Six Of One, Half A Dozen Of The Other?

In this talk, I will investigate the relationship between compositionality and multiword expressions, as part of which I will outline different approaches for formalising the notion of compositionality. I will then briefly review computational methods that have been proposed for modelling compositionality, and applications thereof. Finally, I will discuss possible future directions for modellin...

Determining the Semantic Compositionality of Croatian Multiword Expressions

A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Being able to automatically determine the semantic (non-)compositionality of MWEs is important for many natural language processing tasks. We address the task of determining the semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics...

Extracting Multiword Translations from Aligned Comparable Documents

Most previous attempts to identify translations of multiword expressions using comparable corpora relied on dictionaries of single words. The translation of a multiword was then constructed from the translations of its components. In contrast, in this work we try to determine the translation of a multiword unit by analyzing its contextual behaviour in aligned comparable documents, thereby not p...

Publication date: 2013